GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging

نویسندگان

  • Felix Gremse
  • Andreas Höfter
  • Lars Ole Schwen
  • Fabian Kiessling
  • Uwe Naumann
چکیده

We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using sub-warps of threads to realize an early compression effect which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation as well as with three GPU-based implementations. Measurements performed for computing the matrix square for 21 sparse matrices show that the proposed method consistently outperforms the other methods. Analysis showed that the performance is achieved by utilizing the compression effect and the GPU caching architecture. An improved performance was also found for computing Galerkin products which are required by algebraic multigrid solvers. The performance was particularly good for 7-point stencil matrices arising in the context of diffuse optical imaging and the improved performance allows to perform image reconstruction at higher resolution using the same computational resources.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hybrid format for better performance of sparse matrix-vector multiplication on a GPU

In this paper, we present a new sparse matrix data format that leads to improved memory coalescing and more efficient sparse matrix-vector multiplication (SpMV) for a wide range of problems on high throughput architectures such as a graphics processing unit (GPU). The sparse matrix structure is constructed by sorting the rows based on the row length (defined as the number of non-zero elements i...

متن کامل

Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform

Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operation over irregular data, which is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we utilize the ordere...

متن کامل

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix densematrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balanc...

متن کامل

Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU

We present new adaptive format for storing sparse matrices on GPU. We compare it with several other formats including CUSPARSE which is today probably the best choice for processing of sparse matrices on GPU in CUDA. Contrary to CUSPARSE which works with common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and vector is significantly faster for many ma...

متن کامل

On improving the performance of sparse matrix-vector multiplication

We analyze single-node performance of sparse matrix-vector multiplication by investigating issues of data locality and ne-grained parallelism. We examine the data-locality characteristics of the compressed-sparse-row representation and consider improvements in locality through matrix permutation. Motivated by potential improvements in ne-grained parallelism, we evaluate modiied sparse-matrix re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • SIAM J. Scientific Computing

دوره 37  شماره 

صفحات  -

تاریخ انتشار 2015